Skip to content

support virtual packages on generic git hosts (Gitea)#587

Open
ganesanviji wants to merge 5 commits intomicrosoft:mainfrom
ganesanviji:feat/genric-host-gitea-private
Open

support virtual packages on generic git hosts (Gitea)#587
ganesanviji wants to merge 5 commits intomicrosoft:mainfrom
ganesanviji:feat/genric-host-gitea-private

Conversation

@ganesanviji
Copy link
Copy Markdown

Description

Add support for installing virtual packages from self-hosted Git services like Gitea. Currently, APM only supports virtual packages (subdirectories) on GitHub. This change enables users with Gitea to install packages from subdirectories within repositories.

Changes:

  • Enhanced virtual package detection in DependencyReference to recognize subdirectory packages on generic Git hosts (any FQDN)
  • Added authenticated raw file downloads for private repositories on generic hosts
  • Updated API endpoint from /api/v3 to /api/v1 for better compatibility with Gitea and other Git services
  • Maintains full backward compatibility with existing GitHub functionality

More details about the changes:
✅ Change 1: Virtual Package Detection (reference.py)

Analysis: This only affects generic Git hosts, not GitHub. Allows subdirectory packages to be detected as virtual even without specific file extensions. Safe because:

GitHub uses separate logic path (is_generic_host = False)
Validation still requires package markers (apm.yml, SKILL.md, etc.) in the subdirectory
No impact on existing GitHub virtual file detection

✅ Change 2: Authenticated Raw Downloads (github_downloader.py)

Analysis: Improves private repo support. Safe because:

Only applies to generic hosts, not GitHub
Falls back to API if raw fails
Uses standard Authorization header format

✅ Change 3: API Endpoint Update

Analysis: Gitea uses /api/v1/, GitHub uses /api/v3/. Safe because:

GitHub still uses /api/v3/
Gitea API v1 is compatible for contents endpoint
Falls back gracefully if endpoint doesn't exist

Motivation:
Enterprise teams using self-hosted Git services (Gitea) cannot currently use APM to install packages from repository subdirectories. This is a significant limitation for organizations that don't use GitHub. These changes enable APM to work seamlessly across all Git hosting platforms.

Type of change

  • New feature
  • Bug fix
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally

    • Gitea virtual package parsing: PASS
    • GitHub virtual file parsing: PASS (unchanged)
    • Regular repo parsing: PASS (unchanged)
  • All existing tests pass

    • Code validated with custom test cases for Gitea URLs
    • Backward compatibility verified for GitHub usage
  • Added tests for new functionality (if applicable)

    • Validated with multiple test scenarios

@ganesanviji
Copy link
Copy Markdown
Author

@microsoft-github-policy-service agree

@danielmeppiel
Copy link
Copy Markdown
Collaborator

Review Feedback

Thanks @ganesanviji for adding Gitea support! The raw URL download approach is a good idea. A few issues need addressing:

1. API version change breaks GitLab (critical)

Changing /api/v3/ to /api/v1/ fixes Gitea but breaks GitLab (which uses /api/v4/). The current /api/v3/ also doesn't work for Gitea, so the real fix is per-host API version detection.

Options:

  • Preferred: Try the raw URL path first (your new code), then fall back to API with version negotiation (try v1, then v3, then v4)
  • Alternative: Make API version configurable per host in marketplace or auth config

2. Virtual package detection too broad

len(path_segments) > 2 would treat any path with 3+ segments as virtual. For example, gitea.example.com/owner/repo has exactly 2 segments (owner + repo) but gitea.example.com/owner/repo/subdir has 3. The current logic (has_virtual_ext or has_collection) is more precise. Could you check if the issue is specifically that Gitea paths aren't being detected, and narrow the condition?

3. Bare except: pass (line ~1069)

Please catch specific exceptions:

except (requests.RequestException, OSError):
    pass

4. No unit tests

Please add tests for:

  • Gitea raw URL download succeeds
  • GitLab API URL still works (regression test)
  • Virtual package detection for generic hosts

Relationship with PR #584

This PR complements #584 (which fixes the validation/ls-remote path). They don't conflict and can merge independently.

Copy link
Copy Markdown
Collaborator

@danielmeppiel danielmeppiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per previous comment

@ganesanviji
Copy link
Copy Markdown
Author

Hi @danielmeppiel ,

Thanks for review and I have addressed all the reviewed suggestions,

1. API version change breaks GitLab (critical)

Addressed with the preferred approach. For non-GitHub/GHE hosts we now attempt
the raw URL path first:

https://{host}/{owner}/{repo}/raw/{ref}/{file_path}

If that returns a non-200 we fall through to API version negotiation, trying
v1 -> v3 -> v4 in order. This covers Gitea (v1), legacy Gogs (v3), and
GitLab (v4) without hardcoding anything per host. GitHub and GHE continue to
use their existing code paths unchanged.


2. Virtual package detection too broad

We did not use len(path_segments) > 2. The existing
has_virtual_ext or has_collection guard is kept intact. The only change is
the else branch (no virtual indicator present):

  • GitLab (gitlab.com or any gitlab.* hostname): keeps
    min_base_segments = len(path_segments) -- the full path is the repo,
    preserving nested-group support.
  • All other generic hosts (Gitea, Bitbucket, self-hosted git, etc.): uses
    min_base_segments = 2 -- owner/repo convention, any extra segments are
    treated as a virtual subdirectory path.

The distinction is driven by a new is_gitlab_hostname() helper added to
github_host.py.


3. Bare except: pass

Fixed. The catch at that location is now:

except (requests.RequestException, OSError):
    pass

4. No unit tests

Added in two files:

tests/unit/test_github_host.py -- test_is_gitlab_hostname() covers:

  • gitlab.com and gitlab.* self-hosted instances return True
  • Case-insensitive matching (GITLAB.COM)
  • Negative cases: GitHub, Gitea, Bitbucket, Azure DevOps, None, ""

tests/unit/test_generic_git_urls.py -- TestGiteaVirtualPackageDetection
class covers:

  • Gitea virtual file extension detected as virtual (owner/repo/file.prompt.md)
  • Gitea /collections/ path detected as virtual collection
  • Dict-format virtual package on Gitea host
  • Plain two-segment owner/repo on Gitea is never virtual

TestNestedGroupSupport provides the GitLab regression guard --
gitlab.com/group/subgroup/repo must not be detected as virtual.

@ganesanviji
Copy link
Copy Markdown
Author

@danielmeppiel - Could you please review the changes and update is there any changes or explanation needed on these changes AS SOON AS POSSIBLE. It would be very helpful to include the gitea support in APM in next release to use.

@danielmeppiel danielmeppiel requested a review from Copilot April 9, 2026 04:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds broader support for installing virtual packages from non-GitHub Git hosts (with a focus on Gitea), by updating dependency parsing heuristics and expanding the downloader’s raw/API fetching logic, plus new regression tests around hostname classification and generic-host URL handling.

Changes:

  • Add is_gitlab_hostname() and use it during virtual-package detection to treat GitLab nested-group paths as repo paths by default.
  • Extend generic-host downloads with a raw URL attempt and API version “negotiation”.
  • Add unit tests covering GitLab hostname detection, Gitea/generic URL parsing expectations, and generic-host download behavior.
Show a summary per file
File Description
tests/unit/test_github_host.py Adds tests for GitLab hostname detection.
tests/unit/test_generic_git_urls.py Adds Gitea/generic-host virtual package detection regression tests.
tests/test_github_downloader.py Adds tests for generic-host raw download + API version fallback behavior.
src/apm_cli/utils/github_host.py Introduces is_gitlab_hostname() helper.
src/apm_cli/models/dependency/reference.py Adjusts virtual-package detection and standard URL parsing behavior for generic hosts/GitLab.
src/apm_cli/deps/github_downloader.py Adds generic-host raw fetch and API version candidate list changes.

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 5

Comment on lines 578 to 596
elif is_generic_host:
has_virtual_ext = any(
any(seg.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS)
for seg in path_segments
)
has_collection = "collections" in path_segments
# GitLab supports nested groups (group/subgroup/repo), so the full
# path is the repo -- no shorthand subdirectory splitting.
# Use https://gitlab.com/group/subgroup/repo.git for GitLab nested
# groups; shorthand subdirectory syntax is not supported for GitLab.
# All other generic hosts (Gitea, Bitbucket, self-hosted, etc.) use
# the owner/repo convention, so extra segments are a virtual subdir.
if has_virtual_ext or has_collection:
min_base_segments = 2
else:
elif is_gitlab_hostname(validated_host):
min_base_segments = len(path_segments)
else:
min_base_segments = 2
else:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_detect_virtual_package() still treats any generic FQDN path with >2 segments as a virtual package when there are no explicit virtual indicators (file extension or /collections/). That makes refs like gitea.myorg.com/group/subgroup/repo parse as repo_url=group/subgroup + virtual_path=repo, which contradicts the new tests and breaks nested-group repos on non-GitLab hosts. Adjust the generic-host heuristic so that (absent explicit virtual indicators) the full path is treated as the repo (i.e., do not infer a virtual subdir from extra segments), and require dict form (git+path) when users want subdirectory virtual packages on generic hosts.

See below for a potential fix:

            # Generic FQDN hosts may support nested repository paths
            # (for example, group/subgroup/repo). Do not infer a virtual
            # subdirectory from extra segments unless the path has an explicit
            # virtual package indicator. For generic hosts, subdirectory virtual
            # packages should use the structured dict form with separate git+path
            # fields instead of shorthand path splitting.
            if has_virtual_ext or has_collection:
                min_base_segments = 2
            else:
                min_base_segments = len(path_segments)

Copilot uses AI. Check for mistakes.
Comment on lines +1706 to +1732
class TestGitLabApiVersionNegotiation:
"""API version negotiation: v1 -> v3 -> v4 for generic hosts."""

def setup_method(self):
with patch.dict(os.environ, {}, clear=True), _CRED_FILL_PATCH:
self.downloader = GitHubPackageDownloader()

def test_gitlab_v4_reached_after_v1_and_v3_return_404(self):
"""GitLab uses /api/v4/ -- negotiation must try v1, v3, then v4."""
dep_ref = DependencyReference.parse("gitlab.myorg.com/owner/repo")
expected = b"gitlab file content"

side_effects = [
_make_resp(404), # raw URL
_make_resp(404), # v1
_make_resp(404), # v3
_make_resp(200, expected), # v4
]
with patch.object(self.downloader, "_resilient_get", side_effect=side_effects) as mock_get:
result = self.downloader.download_raw_file(dep_ref, "skill.md", "main")

assert result == expected
urls = [c[0][0] for c in mock_get.call_args_list]
assert "/api/v1/" in urls[1]
assert "/api/v3/" in urls[2]
assert "/api/v4/" in urls[3]

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests assume a GitLab fallback of '/api/v4/repos/{owner}/{repo}/contents/...', but that's not a valid GitLab API shape (GitLab uses /api/v4/projects/.../repository/files...). As written, this test suite will lock in behavior that won't work against real GitLab instances and may hide regressions. Either remove the GitLab framing (treat this as "try v1 then v3 for Gitea/Gogs") or update both implementation and tests to use GitLab's actual endpoints.

Copilot uses AI. Check for mistakes.
Comment on lines +687 to +713
class TestGiteaVirtualPackageDetection:
"""Gitea-specific virtual package detection -- supplements TestFQDNVirtualPaths
and TestNestedGroupSupport with Gitea host fixtures and regression guards
for the len(path_segments) > 2 over-trigger."""

# --- Must NOT be virtual (nested-group repo, no virtual indicators) ---

def test_three_segment_gitea_path_is_not_virtual(self):
"""group/subgroup/repo on Gitea is a nested-group repo, not virtual."""
dep = DependencyReference.parse("gitea.myorg.com/group/subgroup/repo")
assert dep.host == "gitea.myorg.com"
assert dep.repo_url == "group/subgroup/repo"
assert dep.is_virtual is False

def test_two_segment_gitea_path_is_not_virtual(self):
"""Simple owner/repo on a Gitea host is never virtual."""
dep = DependencyReference.parse("gitea.myorg.com/owner/repo")
assert dep.host == "gitea.myorg.com"
assert dep.repo_url == "owner/repo"
assert dep.is_virtual is False

def test_four_segment_generic_path_without_indicators_is_not_virtual(self):
"""Deep nested groups without file extensions or /collections/ are not virtual."""
dep = DependencyReference.parse("git.company.internal/team/skills/brand-guidelines")
assert dep.is_virtual is False
assert dep.repo_url == "team/skills/brand-guidelines"

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new expectations here (e.g., gitea.myorg.com/group/subgroup/repo and other deep FQDN paths are not virtual unless there are explicit indicators) don't match the current generic-host heuristic in DependencyReference._detect_virtual_package(), which still marks any >2-segment generic-host path as virtual. Unless the parsing logic is updated accordingly, these tests will fail and the feature will still mis-parse nested-group repos on generic hosts.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Comment on lines +584 to +589
# GitLab supports nested groups (group/subgroup/repo), so the full
# path is the repo -- no shorthand subdirectory splitting.
# Use https://gitlab.com/group/subgroup/repo.git for GitLab nested
# groups; shorthand subdirectory syntax is not supported for GitLab.
# All other generic hosts (Gitea, Bitbucket, self-hosted, etc.) use
# the owner/repo convention, so extra segments are a virtual subdir.
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes how virtual packages are detected/handled for generic FQDN hosts (and introduces GitLab-specific nested-group behavior). The Starlight docs and the apm-guide usage doc currently document virtual package rules and the "dict form required when shorthand is ambiguous" note, but they don't describe the generic-host behavior being introduced here (e.g., whether subdirectory virtual packages are supported via shorthand on non-GitHub hosts, or require object form). Please update the relevant docs pages so users of Gitea/self-hosted hosts know which syntax is supported and when they must use the object form.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants